Minimax Estimation of a Multivariate Normal Mean with Unknown Covariance Matrix

by 

Leon  Jay  Gleser 
Purdue  University 

ABSTRACT 

Let $x$ be a $p$-variate ($p \ge 3$) vector, normally distributed with unknown
mean $\theta$ and unknown covariance matrix $\Sigma$. Let $W: p \times p$ be
distributed independently of $x$, and let $W$ have a Wishart distribution with
$n$ degrees of freedom and parameter $\Sigma$. It is desired to estimate
$\theta$ under the quadratic loss $(\delta-\theta)'Q(\delta-\theta)$, where $Q$
is a known positive definite matrix. Under the condition that a lower bound for
the smallest characteristic root of $Q\Sigma$ is known, a family of minimax
estimators is developed.

AMS 1970 Subject classification: Primary 62 C 99; secondary 62 F 10, 62 H 99.

Key words and phrases: Multivariate normal distribution, unknown covariance
matrix, estimation of mean vector, quadratic loss, minimax estimator.

1. INTRODUCTION

Let $x: p \times 1$ be a normally distributed random vector with unknown mean
$\theta$ and unknown covariance matrix $\Sigma$. Assume that we have an
independent estimator $\hat{\Sigma} = n^{-1}W$ of $\Sigma$, where
$W: p \times p$ has a Wishart distribution with $n$ degrees of freedom and
parameter $\Sigma = n^{-1}E(W)$. In the usual notation,

$$x \sim N(\theta, \Sigma), \qquad W \sim W_p(n, \Sigma). \tag{1}$$

We wish to estimate $\theta$ with an estimator $\delta(x,W)$ subject to the
quadratic loss function

$$L(\delta,\theta,\Sigma) = (\delta-\theta)'Q(\delta-\theta)/\mathrm{tr}(Q\Sigma). \tag{2}$$

Here, $Q$ is a known $p \times p$ positive definite matrix, and $\mathrm{tr}(A)$
denotes the trace of the matrix $A$. Note that $\mathrm{tr}(Q\Sigma)$ is just a
normalizing constant, chosen to give the estimator $\delta_0(x,W) = x$ constant
risk. It is well known that $\delta_0$ is a minimax estimator for this problem.
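The constant-risk remark can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original argument: the function name `risk_of_x` and the particular values of `theta`, `Sigma`, and `Q` are illustrative assumptions. Since $E(x-\theta)'Q(x-\theta) = \mathrm{tr}(Q\Sigma)$, the estimated normalized risk of $\delta_0(x,W) = x$ should come out close to 1 whatever parameter values are used.

```python
import numpy as np

def risk_of_x(theta, Sigma, Q, reps=20000, seed=0):
    """Monte Carlo estimate of the risk of delta_0(x, W) = x under loss (2)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(Sigma)
    z = rng.standard_normal((reps, len(theta)))
    x = theta + z @ chol.T                 # rows are draws of x ~ N(theta, Sigma)
    err = x - theta
    quad = np.einsum("ij,jk,ik->i", err, Q, err)   # (x - theta)' Q (x - theta)
    return quad.mean() / np.trace(Q @ Sigma)

# Illustrative (assumed) parameter values.
theta = np.array([1.0, -2.0, 0.5])
Sigma = np.diag([1.0, 4.0, 0.25])
Q = np.diag([2.0, 1.0, 3.0])
risk = risk_of_x(theta, Sigma, Q)
print(risk)
```

Changing `theta` or `Sigma` should leave the estimate near 1, which is the sense in which $\mathrm{tr}(Q\Sigma)$ normalizes the loss.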

The limiting case of this problem where $\Sigma$ is completely known
(corresponding here to $n = \infty$) has recently received a good deal of
attention. [See Berger [1] for references.] The problem with $\Sigma$ unknown
and $Q = \Sigma^{-1}$ (which is not a special case of our problem because
$Q = \Sigma^{-1}$ cannot be known) has also been studied by James and Stein [5],
Lin and Tsai [6], Bock [2], and Efron and Morris [3,4], among others. However,
the assumption that $Q = \Sigma^{-1}$ is rather artificial (it seems to be
motivated only by invariance arguments), and does not seem to be of practical
importance. A possibly more reasonable assumption to make relating $Q$ and
$\Sigma$ is that something is known about the characteristic roots of
$Q\Sigma$. [Note that if $Q = \Sigma^{-1}$, all of the characteristic roots of
$Q\Sigma$ are equal to 1.] In the present paper, we assume that there exists a
known constant $K > 0$ such that

$$\mathrm{ch}_p(Q\Sigma) \ge K, \quad \text{all } \Sigma > 0, \tag{3}$$

where

$$\mathrm{ch}_1(A) \ge \mathrm{ch}_2(A) \ge \cdots \ge \mathrm{ch}_p(A)$$

denote the ordered characteristic roots of the $p \times p$ symmetric matrix $A$.

We consider estimators of the form

$$\delta_h(x,W) = (I_p - h(x'W^{-1}x)\,Q^{-1}W^{-1})\,x, \tag{4}$$

where $h(u)$ is an absolutely continuous function on $[0,\infty)$. Our main
result, which is proven in Section 2, is the following.

THEOREM 1. If (3) holds, then any estimator of the form (4) for which
(i) $u\,h(u)$ is nondecreasing in $u$,
(ii) $0 \le h(u) \le 2(p-2)(n-p)K/[u(n-1)]$, all $u \ge 0$,
dominates $\delta_0(x,W) = x$ in risk, and hence is minimax.
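To make Theorem 1 concrete, here is a minimal simulation sketch under stated assumptions. The choice $h(u) = c/(a+u)$ with $c = (p-2)(n-p)K/(n-1)$ and $a = 0.1$ is an illustrative one, not taken from the paper; it is used because $u\,h(u) = cu/(a+u)$ is nondecreasing and $h(u) \le c/u$ stays within half of the bound in (ii). The function names and the test configuration ($p = 5$, $n = 20$, $\Sigma = Q = I_p$, $K = 1$, $\theta = 0$) are likewise assumptions.

```python
import numpy as np

def delta_h(x, W, Q, n, K, a=0.1):
    """Estimator (4) with the illustrative choice h(u) = c / (a + u)."""
    p = len(x)
    c = (p - 2) * (n - p) * K / (n - 1)    # half of the constant in condition (ii)
    Winv_x = np.linalg.solve(W, x)
    u = x @ Winv_x                         # u = x' W^{-1} x
    return x - (c / (a + u)) * np.linalg.solve(Q, Winv_x)

def compare_risks(theta, Sigma, Q, n, K, reps=4000, seed=1):
    """Monte Carlo risks, under loss (2), of delta_0 = x and of delta_h."""
    rng = np.random.default_rng(seed)
    p = len(theta)
    chol = np.linalg.cholesky(Sigma)
    norm = np.trace(Q @ Sigma)
    loss_x = loss_h = 0.0
    for _ in range(reps):
        x = theta + chol @ rng.standard_normal(p)
        Z = rng.standard_normal((n, p)) @ chol.T   # rows ~ N(0, Sigma)
        W = Z.T @ Z                                # W ~ Wishart_p(n, Sigma)
        d = delta_h(x, W, Q, n, K)
        loss_x += (x - theta) @ Q @ (x - theta)
        loss_h += (d - theta) @ Q @ (d - theta)
    return loss_x / (reps * norm), loss_h / (reps * norm)

risk_x, risk_h = compare_risks(np.zeros(5), np.eye(5), np.eye(5), n=20, K=1.0)
print(risk_x, risk_h)
```

In this configuration $\mathrm{ch}_p(Q\Sigma) = 1$, so (3) holds with $K = 1$, and the simulated risk of $\delta_h$ should fall visibly below that of $x$. A simulation of this kind illustrates the theorem at one parameter point only; it does not replace the proof.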

It is clearly of interest to determine what happens to estimators of the form
(4) when the bound (3) can be violated. In Section 3 it is shown that when (3)
does not hold, no estimator of the form (4) can be minimax. [Bock [2] has
previously shown that for $Q = I_p$, no estimator of the form
$h(x'W^{-1}x)x$ can be minimax.] It is conjectured that members of a certain
family (see (36)) of estimators closely resembling the estimators (4) in form
may be minimax, but no proof of this result is given.

2.  PROOF  OF  THEOREM  1 


Let

$$\Delta(\theta,\Sigma) = \mathrm{tr}(Q\Sigma)\,E[L(\delta_h,\theta,\Sigma) - L(\delta_0,\theta,\Sigma)]. \tag{5}$$

Clearly if $\Delta(\theta,\Sigma) \le 0$, all $\theta$, all $\Sigma$ satisfying
(3), then $\delta_h$ is minimax for our problem.

Using the fact that $a'Qa - b'Qb = (a-b)'Q(a+b)$, the fact that
$\delta_0(x,W) = x$, and (4), we obtain

$$\Delta(\theta,\Sigma) = E[h^2(x'W^{-1}x)\,x'W^{-1}Q^{-1}W^{-1}x] - 2E[h(x'W^{-1}x)\,x'W^{-1}(x-\theta)]. \tag{6}$$
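The algebra behind (6) holds pointwise, before any expectation is taken, so it can be checked on a single draw. The sketch below is only a numerical sanity check; the matrices, the vectors, and the scalar `h` (standing in for the value $h(x'W^{-1}x)$) are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
Q = A @ A.T + p * np.eye(p)            # arbitrary positive definite Q
B = rng.standard_normal((p + 3, p))
W = B.T @ B                            # arbitrary positive definite W
x = rng.standard_normal(p)
theta = rng.standard_normal(p)
h = 0.7                                # stands in for the scalar h(x' W^{-1} x)

Winv_x = np.linalg.solve(W, x)
delta = x - h * np.linalg.solve(Q, Winv_x)     # estimator (4)

# Unnormalized loss difference between delta_h and delta_0 = x ...
lhs = (delta - theta) @ Q @ (delta - theta) - (x - theta) @ Q @ (x - theta)
# ... and the integrand on the right-hand side of (6).
rhs = h**2 * (Winv_x @ np.linalg.solve(Q, Winv_x)) - 2 * h * (Winv_x @ (x - theta))
print(lhs, rhs)
```

The two printed numbers should agree up to rounding error.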


Note that for any function $g(x,W)$ for which $E\,g(x,W)$ exists, we may write

$$E[g(x,W)] = E_W\{E_{x|W}[g(x,W)]\} = E_W\{E_x[g(x,W)]\}, \tag{7}$$

where $E_{x|W}[g(x,W)]$ denotes expectation over the conditional distribution
of $x$ given $W$, and $E_W$ and $E_x$ denote expectations over the marginal
distributions of $W$ and $x$ respectively. The last equality in (7) holds
since $x$ and $W$ are statistically independent. Further, using integration
by parts term by term in the elements of $x$ (with $W$ treated as a fixed
matrix), it can be shown (see Berger [1]) that

$$E_x[h(x'W^{-1}x)\,x'W^{-1}(x-\theta)] = E_x[h(x'W^{-1}x)\,\mathrm{tr}(W^{-1}\Sigma)] + 2E_x[h^{(1)}(x'W^{-1}x)\,x'W^{-1}\Sigma W^{-1}x], \tag{8}$$

where $h^{(1)}(u) = dh(u)/du$. [Note: We are assuming that $h(u)$ is
differentiable; if not, a similar argument, using Riemann integration,
produces a corresponding result; see Berger [1].]
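The two terms on the right of (8) can be traced to a single differentiation. The sketch below assumes the standard multivariate normal integration-by-parts identity (often attributed to Stein) and uses $g(x)$ as notation introduced here for illustration; it is a reading aid, not the argument of Berger [1].

```latex
% Notation introduced here: g(x) = h(x'W^{-1}x) W^{-1} x, with W held fixed.
\begin{align*}
E_x\big[(x-\theta)'g(x)\big]
  &= E_x\Big[\mathrm{tr}\Big(\Sigma\,\frac{\partial g(x)}{\partial x'}\Big)\Big], \\
\frac{\partial g(x)}{\partial x'}
  &= h(x'W^{-1}x)\,W^{-1} + 2h^{(1)}(x'W^{-1}x)\,W^{-1}xx'W^{-1}, \\
\mathrm{tr}\Big(\Sigma\,\frac{\partial g(x)}{\partial x'}\Big)
  &= h(x'W^{-1}x)\,\mathrm{tr}(W^{-1}\Sigma)
     + 2h^{(1)}(x'W^{-1}x)\,x'W^{-1}\Sigma W^{-1}x .
\end{align*}
```

Since $x'W^{-1}(x-\theta)\,h(x'W^{-1}x) = (x-\theta)'g(x)$, the last line is the integrand on the right-hand side of (8).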

From (6), (7), and (8), we have

$$\Delta(\theta,\Sigma) = E[h^2(x'W^{-1}x)\,x'W^{-1}Q^{-1}W^{-1}x - 2h(x'W^{-1}x)\,\mathrm{tr}(W^{-1}\Sigma) - 4h^{(1)}(x'W^{-1}x)\,x'W^{-1}\Sigma W^{-1}x]. \tag{9}$$

We now find a canonical representation for (9). Make the change of variables

$$y = \Sigma^{-1/2}x, \qquad V = \Sigma^{-1/2}W\Sigma^{-1/2}, \tag{10}$$

where $\Sigma^{1/2}$ is any square root of $\Sigma$. Then

$$y \sim N(\eta, I_p), \qquad V \sim W_p(n, I_p), \tag{11}$$

where $\eta = \Sigma^{-1/2}\theta$. Further, $y$ and $V$ are statistically
independent. From (9) and (10), with

$$Q^* = \Sigma^{1/2}Q\Sigma^{1/2},$$

and using arguments and notation analogous to those used to obtain (7), we
have

$$\Delta(\theta,\Sigma) = E_y E_V[h^2(y'V^{-1}y)\,y'V^{-1}(Q^*)^{-1}V^{-1}y - 2h(y'V^{-1}y)\,\mathrm{tr}\,V^{-1} - 4h^{(1)}(y'V^{-1}y)\,y'V^{-2}y]. \tag{12}$$
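The passage from (9) to (12) rests on four pointwise identities between the $(x,W)$ and $(y,V)$ quantities. A minimal numerical check follows, assuming the symmetric square root of $\Sigma$ (computed here by eigendecomposition); the helper `sym_power` and all numerical inputs are illustrative assumptions.

```python
import numpy as np

def sym_power(S, a):
    """Symmetric power S^a of a positive definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**a) @ vecs.T

rng = np.random.default_rng(0)
p, n = 4, 10
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, p))
Z = rng.standard_normal((n, p))
Sigma = A @ A.T + np.eye(p)
Q = B @ B.T + np.eye(p)
W = Z.T @ Z
x = rng.standard_normal(p)

S_half = sym_power(Sigma, 0.5)
S_mhalf = sym_power(Sigma, -0.5)
y = S_mhalf @ x                       # change of variables (10)
V = S_mhalf @ W @ S_mhalf
Q_star = S_half @ Q @ S_half

Winv_x = np.linalg.solve(W, x)
Vinv_y = np.linalg.solve(V, y)
# Each pair is (quantity in (9), corresponding quantity in (12)).
pairs = [
    (x @ Winv_x, y @ Vinv_y),
    (Winv_x @ np.linalg.solve(Q, Winv_x), Vinv_y @ np.linalg.solve(Q_star, Vinv_y)),
    (np.trace(np.linalg.solve(W, Sigma)), np.trace(np.linalg.inv(V))),
    (Winv_x @ Sigma @ Winv_x, Vinv_y @ Vinv_y),
]
print(pairs)
```

Each pair should agree up to rounding, which is why $\Sigma$ enters (12) only through $\eta$ and $Q^*$.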


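Let $\Gamma_y$ be $p \times p$ orthogonal with first row equal to
$(y'y)^{-1/2}y'$. Let

$$U = \Gamma_y V \Gamma_y', \qquad Q_y^* = \Gamma_y Q^* \Gamma_y'. \tag{13}$$

Then, given $y$, $U \sim W_p(n, I_p)$, so that $U$ and $y$ are statistically
independent. Partition $U$ as

$$U = \begin{pmatrix} u_{11} & u_{21}' \\ u_{21} & U_{22} \end{pmatrix}, \qquad u_{11}: 1 \times 1, \quad U_{22}: (p-1) \times (p-1),$$

and let

$$s = u_{11} - u_{21}'U_{22}^{-1}u_{21}, \qquad t = U_{22}^{-1/2}u_{21}, \tag{14}$$

where $U_{22}^{1/2}$ is any square root of $U_{22}$. It is well known that $s$,
$t$, and $U_{22}$ are statistically independent, with

$$s \sim \chi^2_{n-p+1}, \qquad t \sim N(0, I_{p-1}), \qquad U_{22} \sim W_{p-1}(n, I_{p-1}). \tag{15}$$

Further, $V^{-1} = \Gamma_y'U^{-1}\Gamma_y$ and

$$U^{-1} = s^{-1}\begin{pmatrix} 1 & -t'U_{22}^{-1/2} \\ -U_{22}^{-1/2}t & U_{22}^{-1/2}(sI_{p-1} + tt')U_{22}^{-1/2} \end{pmatrix}, \tag{16}$$

so that

$$y'V^{-1}y = s^{-1}y'y, \qquad y'V^{-2}y = s^{-2}y'y\,(1 + t'U_{22}^{-1}t), \tag{17}$$

$$\mathrm{tr}\,V^{-1} = \mathrm{tr}\,U^{-1} = s^{-1}(1 + t'U_{22}^{-1}t) + \mathrm{tr}\,U_{22}^{-1}, \tag{18}$$

and

$$y'V^{-1}(Q^*)^{-1}V^{-1}y = s^{-2}y'y\,(1, -t'U_{22}^{-1/2})(Q_y^*)^{-1}(1, -t'U_{22}^{-1/2})'. \tag{19}$$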
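The partitioned inverse (16) and the identities (17), (18), and (19) are purely algebraic, so they can be verified on one random draw. The sketch below assumes the symmetric square root of $U_{22}$ and builds $\Gamma_y$ from a QR factorization; these construction choices, the helper `sym_power`, and the numerical inputs are assumptions made for illustration.

```python
import numpy as np

def sym_power(S, a):
    """Symmetric power S^a of a positive definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**a) @ vecs.T

rng = np.random.default_rng(0)
p, n = 4, 10
Z = rng.standard_normal((n, p))
V = Z.T @ Z                               # a draw of V ~ W_p(n, I_p)
B = rng.standard_normal((p, p))
Q_star = B @ B.T + np.eye(p)              # arbitrary positive definite Q*
y = rng.standard_normal(p)

# Orthogonal Gamma_y whose first row is y'/|y| (QR of [y, e_2, ..., e_p]).
M = np.eye(p)
M[:, 0] = y
Gamma = np.linalg.qr(M)[0].T.copy()
if Gamma[0] @ y < 0:
    Gamma[0] *= -1.0

U = Gamma @ V @ Gamma.T                   # (13)
Qy = Gamma @ Q_star @ Gamma.T
u11, u21, U22 = U[0, 0], U[1:, 0], U[1:, 1:]
s = u11 - u21 @ np.linalg.solve(U22, u21) # (14)
R = sym_power(U22, -0.5)                  # U22^{-1/2}
t = R @ u21
q = t @ np.linalg.solve(U22, t)           # t' U22^{-1} t
a = np.concatenate(([1.0], -R @ t))       # the vector (1, -t' U22^{-1/2})'

U_inv = np.block([
    [np.ones((1, 1)), -(R @ t)[None, :]],
    [-(R @ t)[:, None], R @ (s * np.eye(p - 1) + np.outer(t, t)) @ R],
]) / s                                    # right-hand side of (16)

yy = y @ y
Vinv_y = np.linalg.solve(V, y)
ok = [
    np.allclose(U_inv, np.linalg.inv(U)),                                   # (16)
    np.isclose(y @ Vinv_y, yy / s),                                         # (17), first part
    np.isclose(Vinv_y @ Vinv_y, yy * (1 + q) / s**2),                       # (17), second part
    np.isclose(np.trace(np.linalg.inv(V)),
               (1 + q) / s + np.trace(np.linalg.inv(U22))),                 # (18)
    np.isclose(Vinv_y @ np.linalg.solve(Q_star, Vinv_y),
               yy * (a @ np.linalg.solve(Qy, a)) / s**2),                   # (19)
]
print(ok)
```

All five entries of `ok` should be true; the check says nothing about the distributional claims in (15), which are separate facts about the Wishart distribution.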


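Under the distributional assumptions given in (15), it is known that
$E(U_{22}^{-1}) = (n-p)^{-1}I_{p-1}$, so that

$$E\,\mathrm{tr}\,U_{22}^{-1} = \mathrm{tr}\,E\,U_{22}^{-1} = (n-p)^{-1}(p-1). \tag{20}$$

For any constant matrix $A$,

$$\begin{aligned}
E[(1, -t'U_{22}^{-1/2})\,A\,(1, -t'U_{22}^{-1/2})']
&= E_{U_{22}} E_t\, \mathrm{tr}\left\{A \begin{pmatrix} 1 & -t'U_{22}^{-1/2} \\ -U_{22}^{-1/2}t & U_{22}^{-1/2}tt'U_{22}^{-1/2} \end{pmatrix}\right\} \\
&= E_{U_{22}}\, \mathrm{tr}\left\{A \begin{pmatrix} 1 & 0 \\ 0 & U_{22}^{-1} \end{pmatrix}\right\} \\
&= \mathrm{tr}\left\{A \begin{pmatrix} 1 & 0 \\ 0 & (n-p)^{-1}I_{p-1} \end{pmatrix}\right\}.
\end{aligned} \tag{21}$$

Taking $A = I_p$, the result (21) allows us to verify that

$$E(1 + t'U_{22}^{-1}t) = (n-p)^{-1}(n-1). \tag{22}$$

Taking $A = (Q_y^*)^{-1}$, the result (21) yields

$$E[(1, -t'U_{22}^{-1/2})(Q_y^*)^{-1}(1, -t'U_{22}^{-1/2})'] = \mathrm{tr}\left\{(Q_y^*)^{-1} \begin{pmatrix} 1 & 0 \\ 0 & (n-p)^{-1}I_{p-1} \end{pmatrix}\right\}. \tag{23}$$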
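For reference, the arithmetic behind (22) can be written out in one line. This is only a restatement of the steps already used in (20) and (21), relying on the independence of $t$ and $U_{22}$ and on $E(tt') = I_{p-1}$.

```latex
\begin{align*}
E\big(1 + t'U_{22}^{-1}t\big)
  &= 1 + E_{U_{22}}\,\mathrm{tr}\big(U_{22}^{-1}E_t[tt']\big)
   = 1 + \mathrm{tr}\,E\big(U_{22}^{-1}\big) \\
  &= 1 + \frac{p-1}{n-p} = \frac{n-1}{n-p}.
\end{align*}
```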


If in (12) we make the change of variables (13) and (14), and take account of
the identities (17), (18), and (19), then by taking our expected values in the
order $E_y E_s E_{t,U_{22}}$, and using (20), (22), and (23), we obtain

$$\begin{aligned}
\Delta(\theta,\Sigma) = (n-p)^{-1}E_y E_s[&h^2(s^{-1}y'y)\,s^{-2}y'y\,T(y,Q^*) - 2h(s^{-1}y'y)\,s^{-1}(n-1) \\
&- 2h(s^{-1}y'y)(p-1) - 4h^{(1)}(s^{-1}y'y)\,s^{-2}y'y\,(n-1)],
\end{aligned} \tag{24}$$

where

$$T(y,Q^*) = \mathrm{tr}\left\{(Q_y^*)^{-1}\begin{pmatrix} n-p & 0 \\ 0 & I_{p-1} \end{pmatrix}\right\} = (n-p-1)(y'y)^{-1}y'(Q^*)^{-1}y + \mathrm{tr}(Q^*)^{-1}. \tag{25}$$

Finally, integrating by parts in $s$, we can show that

$$E_s\,h(s^{-1}y'y) = (n-p-1)E_s[s^{-1}h(s^{-1}y'y)] - 2E_s[s^{-2}y'y\,h^{(1)}(s^{-1}y'y)], \tag{26}$$
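The integration-by-parts identity (26) can be checked by numerical quadrature against the $\chi^2_{n-p+1}$ density of $s$. In the sketch below the function $h(u) = 1/(1+u)$, the fixed value `a` playing the role of $y'y$, the grid, and the helper `chi2_expect` are all assumptions chosen for illustration.

```python
import math
import numpy as np

def chi2_expect(f, k, upper=400.0, m=400001):
    """E[f(s)] for s ~ chi-square(k), by the trapezoid rule on (0, upper]."""
    s = np.linspace(1e-8, upper, m)
    log_pdf = ((k / 2 - 1) * np.log(s) - s / 2
               - (k / 2) * math.log(2) - math.lgamma(k / 2))
    g = f(s) * np.exp(log_pdf)
    return float(np.sum((g[1:] + g[:-1]) * np.diff(s)) / 2)

n, p, a = 20, 5, 3.0              # a plays the role of y'y, held fixed
k = n - p + 1                     # degrees of freedom of s, from (15)

def h(u):
    return 1.0 / (1.0 + u)        # illustrative smooth h, not from the paper

def h1(u):
    return -1.0 / (1.0 + u) ** 2  # its derivative

lhs = chi2_expect(lambda s: h(a / s), k)
rhs = ((n - p - 1) * chi2_expect(lambda s: h(a / s) / s, k)
       - 2 * chi2_expect(lambda s: a * h1(a / s) / s**2, k))
print(lhs, rhs)
```

The two sides should agree to within the quadrature error. The coefficient $n-p-1$ is the degrees of freedom minus 2, which is what integration by parts against the chi-square density produces.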

which, when substituted in (24), yields the expression

$$\Delta(\theta,\Sigma) = (n-p)^{-1}E_y E_s[h^2(s^{-1}y'y)\,s^{-2}y'y\,T(y,Q^*) - 2p(n-p)\,s^{-1}h(s^{-1}y'y) - 4(n-p)\,h^{(1)}(s^{-1}y'y)\,s^{-2}y'y], \tag{27}$$

where

$$y \sim N(\eta, I_p), \qquad s \sim \chi^2_{n-p+1},$$

$y$ and $s$ are independent, $\eta = \Sigma^{-1/2}\theta$,
$Q^* = \Sigma^{1/2}Q\Sigma^{1/2}$, and $T(y,Q^*)$ is given by (25). The
expression (27) is the desired canonical form.
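The coefficients in (27) follow from (24) and (26) by collecting terms. The worked step below writes $h$ and $h^{(1)}$ for $h(s^{-1}y'y)$ and $h^{(1)}(s^{-1}y'y)$; the equalities are to be read under the expectation $E_s$, since (26) is an identity between expectations.

```latex
\begin{align*}
&-2(n-1)s^{-1}h \;-\; 2(p-1)\big[(n-p-1)s^{-1}h - 2s^{-2}y'y\,h^{(1)}\big]
  \;-\; 4(n-1)s^{-2}y'y\,h^{(1)} \\
&\qquad = -2\big[(n-1) + (p-1)(n-p-1)\big]s^{-1}h
          \;-\; 4\big[(n-1) - (p-1)\big]s^{-2}y'y\,h^{(1)} \\
&\qquad = -2p(n-p)\,s^{-1}h \;-\; 4(n-p)\,s^{-2}y'y\,h^{(1)} .
\end{align*}
```

The middle bracket simplifies because $(n-1) + (p-1)(n-p-1) = pn - p^2 = p(n-p)$.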

Now,  we  are  ready  to  complete  the  proof  of  Theorem  1. 

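Let

$$r(u) = u\,h(u), \tag{28}$$

and note that

$$h^{(1)}(u) = \frac{r^{(1)}(u)}{u} - \frac{r(u)}{u^2}, \tag{29}$$

where $r^{(1)}(u) = dr(u)/du$. Substituting in (27), we obtain

$$\begin{aligned}
\Delta(\theta,\Sigma) &= (n-p)^{-1}E_y\{(y'y)^{-1}E_s[r^2(s^{-1}y'y)\,T(y,Q^*) - 2(p-2)(n-p)\,r(s^{-1}y'y) \\
&\qquad\qquad - 4(n-p)\,s^{-1}y'y\,r^{(1)}(s^{-1}y'y)]\} \\
&\le (n-p)^{-1}E_y E_s\left\{\frac{r(s^{-1}y'y)}{y'y}\left(T(y,Q^*)\,r(s^{-1}y'y) - 2(p-2)(n-p)\right)\right\},
\end{aligned} \tag{30}$$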
since, by assumption (i) of Theorem 1, $r(u)$ is nondecreasing in $u$. Note
from (3) and (25) that

$$T(y,Q^*) \le (n-1)\,\mathrm{ch}_1[(Q^*)^{-1}] \le (n-1)[\mathrm{ch}_p(Q\Sigma)]^{-1} \le (n-1)K^{-1}. \tag{31}$$
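The bound (31) can be illustrated numerically by evaluating $T(y,Q^*)$ from (25) at many random directions $y$. The sketch below uses arbitrary assumed matrices and the symmetric square root of $\Sigma$; the helper names `sym_sqrt` and `T_stat` are hypothetical. It also records the lower bound $\mathrm{tr}(Q^*)^{-1}$, which holds when $n \ge p+1$ because the first term of (25) is then nonnegative.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a positive definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(vals)) @ vecs.T

def T_stat(y, Q_star, n):
    """T(y, Q*) as given by the second expression in (25)."""
    p = len(y)
    Qs_inv = np.linalg.inv(Q_star)
    return (n - p - 1) * (y @ Qs_inv @ y) / (y @ y) + np.trace(Qs_inv)

rng = np.random.default_rng(0)
p, n = 4, 12
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, p))
Sigma = A @ A.T + np.eye(p)
Q = B @ B.T + np.eye(p)

S_half = sym_sqrt(Sigma)
Q_star = S_half @ Q @ S_half
ch_p = np.linalg.eigvals(Q @ Sigma).real.min()   # smallest root of Q Sigma
upper = (n - 1) / ch_p                           # middle term of (31)
lower = np.trace(np.linalg.inv(Q_star))
values = np.array([T_stat(rng.standard_normal(p), Q_star, n) for _ in range(1000)])
print(lower, values.min(), values.max(), upper)
```

Every sampled value of $T$ should lie between `lower` and `upper`; the upper limit is approached when $y$ points along the eigenvector of $Q^*$ with the smallest root and the roots of $Q^*$ are nearly equal.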

Thus, applying assumption (ii) of Theorem 1, (30), and (31), we conclude
that for all $\Sigma$ satisfying (3),

$$\Delta(\theta,\Sigma) \le 0, \quad \text{all } \theta.$$

This completes the proof of Theorem 1.

We  remark  that  our  proof  actually  demonstrates  the  following. 

THEOREM 2. Let an estimator $\delta_h(x,W)$ of the form (4) satisfy
(i) $u\,h(u)$ is nondecreasing in $u$,
(ii) $0 \le h(u) \le 2(p-2)(n-p)L/u$, all $u \ge 0$,
where $L > 0$ is a given constant. Then if $\Sigma$ satisfies

$$(n-p-1)(\mathrm{ch}_p(Q\Sigma))^{-1} + \mathrm{tr}(Q\Sigma)^{-1} \le L^{-1}, \tag{32}$$

we have

$$\Delta(\theta,\Sigma) \le 0, \quad \text{all } \theta,$$

and $\delta_h(x,W)$ is minimax.
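To see how condition (32) compares with (3), one can compute, for a given $\Sigma$, the largest $L$ allowed by (32) and set it beside the constant $K/(n-1)$ that plays the same role in Theorem 1 when $K = \mathrm{ch}_p(Q\Sigma)$. The sketch below does this for one assumed pair of diagonal matrices; the function names are hypothetical.

```python
import numpy as np

def smallest_root(Q, Sigma):
    """ch_p(Q Sigma): the smallest characteristic root of Q Sigma."""
    return np.linalg.eigvals(Q @ Sigma).real.min()

def theorem2_L(Q, Sigma, n):
    """Largest L for which this particular Sigma satisfies (32)."""
    p = Q.shape[0]
    QS = Q @ Sigma
    return 1.0 / ((n - p - 1) / smallest_root(Q, Sigma) + np.trace(np.linalg.inv(QS)))

# Illustrative (assumed) values.
Q = np.diag([1.0, 2.0, 4.0])
Sigma = np.diag([0.5, 1.0, 3.0])
n = 10
L = theorem2_L(Q, Sigma, n)
K = smallest_root(Q, Sigma)
print(L, K / (n - 1))
```

Because $\mathrm{tr}(Q\Sigma)^{-1} \le p\,[\mathrm{ch}_p(Q\Sigma)]^{-1}$, the value `L` is never smaller than `K / (n - 1)`, so the bound on $h$ in Theorem 2 is at least as permissive as the one in Theorem 1 for this $\Sigma$.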

Although  Theorem  2 is  more  general  than  Theorem  1,  the  additional 
generality  is  unlikely  to  be  of  practical  importance. 

3. THE CASE WHERE $\Sigma$ IS COMPLETELY UNRESTRICTED

When $\Sigma$ is unrestricted, and (3) need not hold, then $\delta_0(x,W)$ is
essentially the only estimator of the form (4) that can be minimax.

THEOREM 3. When $\Sigma$ is unrestricted, no estimator of the form
$\delta_h(x,W) = (I_p - h(x'W^{-1}x)Q^{-1}W^{-1})x$ can be minimax unless
$h(u) = 0$ for almost all $u \ge 0$.

Proof. Note from (25) that

$$T(y,Q^*) \ge \mathrm{tr}(Q^*)^{-1}, \quad \text{for all } y. \tag{33}$$


Now from (33) and (27),

$$(n-p)\,\Delta(\theta,\Sigma) \ge \mathrm{tr}(Q^*)^{-1}E[h^2(s^{-1}y'y)\,s^{-2}y'y] - 2(n-p)E[p\,s^{-1}h(s^{-1}y'y) + 2h^{(1)}(s^{-1}y'y)\,s^{-2}y'y], \tag{34}$$

where the expected values in (34) are easily shown to depend only on
$\theta'\Sigma^{-1}\theta$. Thus, if we choose a sequence
$\{(\theta_i,\Sigma_i)\}$ of parameter values such that
$\theta_i'\Sigma_i^{-1}\theta_i = c$, all $i$, and

$$\mathrm{tr}(Q^*)^{-1} = \mathrm{tr}(Q\Sigma_i)^{-1} \to \infty, \quad \text{as } i \to \infty,$$

we see that unless

$$E[h^2(s^{-1}y'y)\,s^{-2}y'y] = 0, \quad \text{all } \theta'\Sigma^{-1}\theta = c, \tag{35}$$

we will have $\Delta(\theta_i,\Sigma_i) \to \infty$. Thus, for some parameter
points $\Delta(\theta,\Sigma)$ will be positive (indeed, infinitely large), and
hence $\delta_h(x,W)$ cannot be minimax. On the other hand, it is easy to show
that (35) holds if and only if $h(u) = 0$ for almost all $u \ge 0$. This
completes the proof.

Estimators of the form (4) do not perform well when any linear combination of
the elements of $x$ has low variability (implying that $\mathrm{ch}_p(\Sigma)$
is small). To find a class of minimax estimators when $\Sigma$ is
unrestricted, we might think of modifying members of the class (4) to produce
new estimators of the form

$$\delta_h^*(x,W) = (I_p - \mathrm{ch}_p(n^{-1}QW)\,h(x'W^{-1}x)\,Q^{-1}W^{-1})\,x. \tag{36}$$
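For readers who want to experiment with the class (36), a minimal sketch of one member follows. The choice $h(u) = c/u$ and the constant `c` are illustrative assumptions, and nothing in the code bears on whether the resulting estimator is minimax, which this section leaves as an open conjecture.

```python
import numpy as np

def delta_star(x, W, Q, n, c):
    """Estimator (36) with the illustrative choice h(u) = c / u."""
    Winv_x = np.linalg.solve(W, x)
    u = x @ Winv_x                                    # u = x' W^{-1} x
    ch_p = np.linalg.eigvals(Q @ W / n).real.min()    # ch_p(n^{-1} Q W)
    return x - ch_p * (c / u) * np.linalg.solve(Q, Winv_x)

# Degenerate check case: Q = I and W = n I give ch_p = 1 and u = x'x / n,
# so the estimator reduces to (1 - c / x'x) x.
x = np.array([3.0, 4.0, 0.0])
n = 8
est = delta_star(x, n * np.eye(3), np.eye(3), n, c=1.0)
print(est)
```

The data-dependent factor $\mathrm{ch}_p(n^{-1}QW)$ replaces the known constant $K$ of Theorem 1, which is why the estimator can be computed without any prior bound on the roots of $Q\Sigma$.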

Assuming that $\mathrm{ch}_p(n^{-1}QW)$ and $\mathrm{ch}_p(Q\Sigma)$ are close
in value (which should be true at least when $n$ is large), any member of the
class (36) will behave like the minimax estimator $x$ when
$\mathrm{ch}_p(\Sigma)$ is small, and will behave like $\delta_h$ otherwise.
Thus, we have good intuitive reasons for conjecturing that a member of the
class (36) of estimators is minimax provided that (i) $u\,h(u)$ is
nondecreasing in $u$, and (ii) $0 \le h(u) \le 2(p-2)u^{-1}$, all $u \ge 0$.
Unfortunately, we have not yet been able to prove this conjecture. One can
follow the steps used in Section 2, but unlike the result (24) obtained for
the class (4), integration over $t$ and $U_{22}$ does not lead to any
simplification. This lack of simplification is due to the fact that
$\mathrm{ch}_p(n^{-1}QW)$, after the change of variables from $(x,W)$ to
$(y,s,t,U_{22})$, is a complicated and nonlinear function of $y$, $s$, $t$, and